Handling edge cases: wtq-00080, wtq-00085, wtq-00113, wtq-00162, wtq-00168#10

Closed
P6rguVyrst wants to merge 9 commits into ZON-Format:main from P6rguVyrst:main

@P6rguVyrst P6rguVyrst commented Apr 14, 2026

The failing tests are tracked in #9.

If the accuracy tests already pass, then I'm wrong and the PR can be closed, but I have no way to validate them myself.

wtq-00080 Example

  • Original Notes: "Run under FIA rules as the Great Lakes Rally, jointly with Club Automobile du Burundi"
  • Decoded Notes: "Run under FIA rules as the Great Lakes Rally"

## Three distinct bugs

1. Fix boolean-like dictionary keys being converted to booleans

The encoder now quotes keys that match boolean/null keywords (t, f, true,
false, null, none, nil) to prevent the decoder from misinterpreting them.

The decoder now uses a new parse_key() function for dictionary keys that
preserves them as strings, unlike parse_value() which converts keywords.

This fixes a critical round-trip bug where {"f": 1} would decode as
{False: 1}.
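
The key/value split described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the bodies of `parse_value()` and `parse_key()`, the helper `format_key()`, the `KEYWORDS` table, and the case-insensitive matching are all assumptions made for the example.

```python
# Hypothetical keyword table, based on the list in the PR description.
KEYWORDS = {
    "t": True, "true": True,
    "f": False, "false": False,
    "null": None, "none": None, "nil": None,
}


def _unquote(token):
    """Return the inner text if token is double-quoted, else None."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    return None


def parse_value(token):
    """Value position: bare keywords become bool/None (assumed behavior)."""
    token = token.strip()
    inner = _unquote(token)
    if inner is not None:
        return inner
    if token.lower() in KEYWORDS:
        return KEYWORDS[token.lower()]
    return token


def parse_key(token):
    """Key position: always a string, keywords are never converted."""
    token = token.strip()
    inner = _unquote(token)
    return inner if inner is not None else token


def format_key(key):
    """Encoder side: quote keys that would otherwise look like keywords."""
    return f'"{key}"' if key.lower() in KEYWORDS else key
```

With these stand-ins, `parse_value("f")` yields `False` (the old failure mode when applied to keys), while `parse_key(format_key("f"))` yields the string `"f"`.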

2. Fix dictionary header values not being quoted

Values in dictionary compression headers containing commas, colons,
or other special characters were not being quoted, causing data
corruption on decode (values truncated at delimiter).

Now uses _format_value() for consistent quoting of dictionary values.
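
To see why an unquoted delimiter truncates a value, here is a small sketch using CSV-style quoting from Python's `csv` module as a stand-in. ZON's real quoting rules and the real `_format_value()` are not shown in this PR text, so `format_value()`, `split_values()`, and the `SPECIAL` character set below are hypothetical.

```python
import csv
import io

# Assumed set of characters that force quoting; the real list may differ.
SPECIAL = set(',:"\n')


def format_value(value):
    """Quote a string if it contains a delimiter (CSV-style doubling of quotes)."""
    if any(ch in SPECIAL for ch in value):
        return '"' + value.replace('"', '""') + '"'
    return value


def split_values(line):
    """Split one comma-delimited line, honoring double-quoted fields."""
    return next(csv.reader(io.StringIO(line)))


note = ("Run under FIA rules as the Great Lakes Rally, "
        "jointly with Club Automobile du Burundi")

# Without quoting, the comma inside the note is read as a field separator.
unquoted_line = ",".join([note, "other"])
truncated = split_values(unquoted_line)[0]

# With quoting, the note survives intact.
quoted_line = ",".join(format_value(v) for v in [note, "other"])
recovered = split_values(quoted_line)
```

Here `truncated` ends at "Great Lakes Rally", matching the wtq-00080 symptom, while `recovered[0]` equals the full original note.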

3. Exclude float columns from delta encoding

Delta encoding violated spec §2.3's MUST round-trip requirement for float
columns: prev + (cur - prev) in IEEE-754 does not recover the original
double's bit pattern for arbitrary values, and round(diff, 10) at the
encode step compounded the loss. Benchmark data exposed this as e.g.
1865.43 decoding to 1865.4299999999994.
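
The loss can be reproduced outside the library with a toy delta codec. The functions `delta_encode()` and `delta_decode()` below are hypothetical and only mirror the arithmetic described above (`cur - prev`, optionally rounded to N digits, then `prev + diff` on decode); the input values were chosen so the rounding is easy to reason about and are not taken from the benchmark.

```python
def delta_encode(values, ndigits=None):
    """Store the first value, then successive differences (optionally rounded)."""
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev
        deltas.append(round(diff, ndigits) if ndigits is not None else diff)
    return deltas


def delta_decode(deltas):
    """Rebuild values by accumulating the differences."""
    out = [deltas[0]]
    for diff in deltas[1:]:
        out.append(out[-1] + diff)
    return out


# Extreme exponents: doubles near 1e17 are spaced 16 apart, so 1.0 - 1e17
# rounds to -1e17 and the 1.0 is lost entirely.
large_gap = delta_decode(delta_encode([1e17, 1.0]))

# Rounding the difference to 10 digits erases a step of 1e-11.
tiny_step = delta_decode(delta_encode([0.0, 1e-11], ndigits=10))

# Integer deltas are exact, so int columns are unaffected.
ints = delta_decode(delta_encode([1000, 1003, 999]))
```

In this sketch `large_gap` comes back as `[1e17, 0.0]` and `tiny_step` as `[0.0, 0.0]`, while `ints` round-trips unchanged.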

Restrict SparseMode.DELTA eligibility to int-only columns. Float columns
now fall through to standard value encoding, which round-trips exactly
via Python's shortest-round-trip str(float). Int delta encoding (the
common case: IDs, counts, timestamps) is unchanged.
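
An int-only eligibility check might look like the sketch below. The function name `is_delta_eligible()`, the minimum length of two, and the decision to exclude `bool` (which is a subclass of `int` in Python) are assumptions for illustration; the PR text does not show how `SparseMode.DELTA` eligibility is actually decided.

```python
def is_delta_eligible(column):
    """Allow delta encoding only when every value is a true int (no floats, no bools)."""
    return len(column) >= 2 and all(
        isinstance(v, int) and not isinstance(v, bool) for v in column
    )
```

Under these assumptions a mixed column such as `[1, 2.0]` is rejected and would fall through to standard value encoding.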

Regression test covers multiple precision regimes — benchmark values,
math.pi/math.e, 0.1+0.2, extreme exponents, negatives — so a future
round-to-N workaround cannot sneak through.
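
A regression check in this spirit can be sketched without the library by testing the property the fix relies on: `str(float)` in Python 3 produces the shortest string that parses back to the same double. The sample list and the helper `same_bits()` are illustrative assumptions and are not the contents of `test_delta.py`.

```python
import math
import struct

# Samples spanning the precision regimes named in the PR description.
SAMPLES = [
    1865.43,                  # benchmark-style value
    math.pi, math.e,          # irrational constants
    0.1 + 0.2,                # classic binary-fraction case
    1.7976931348623157e308,   # largest finite double
    5e-324,                   # smallest subnormal
    -1865.43, -0.0,           # negatives, including signed zero
]


def same_bits(a, b):
    """Compare two floats by their IEEE-754 bit patterns, not by ==."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def round_trips(x):
    """True if text encoding via str() recovers the exact double."""
    return same_bits(x, float(str(x)))


all_ok = all(round_trips(x) for x in SAMPLES)
```

Comparing bit patterns rather than using `==` also distinguishes `-0.0` from `0.0`, which a value comparison would treat as equal.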

Toomas Ormisson and others added 4 commits April 12, 2026 22:42
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai Bot commented Apr 14, 2026

Warning

Rate limit exceeded

@P6rguVyrst has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 44 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 3ed82e4 and 2df1b31.

📒 Files selected for processing (5)
  • zon-format/src/zon/core/decoder.py
  • zon-format/src/zon/core/encoder.py
  • zon-format/src/zon/core/utils.py
  • zon-format/tests/unit/test_boolean_keys.py
  • zon-format/tests/unit/test_delta.py

P6rguVyrst changed the title from "wtq-00080, wtq-00085, wtq-00113, wtq-00162, wtq-00168" to "Handling edge cases: wtq-00080, wtq-00085, wtq-00113, wtq-00162, wtq-00168" on Apr 14, 2026
@ronibhakta1
Contributor

I have a $10 Claude API token that has gone unused for a long time. If you want me to send it over, just let me know your email.
What you can do is add a script that reuses the existing script but points it at your Claude model, and at your Ollama model too if you want.

P6rguVyrst and others added 4 commits April 19, 2026 14:36
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@P6rguVyrst P6rguVyrst closed this Apr 20, 2026

2 participants